Inhomogeneous DNA: conducting exons and insulating introns

Parts of DNA sequences known as exons and introns play very different roles in the coding
and storage of genetic information. Here we show that their conducting properties are also
very different. Taking into account long-range correlations among the four basic nucleotides
that form a double-stranded DNA sequence, we calculate the electron localization length for
exon and intron regions. Analyzing different DNA molecules, we find that the exons have
narrow bands of extended states, unlike the introns, where all the states are well localized.
The band of extended states is due to a specific form of the binary correlation function of
the sequence of basic DNA nucleotides.

I. INTRODUCTION 

A DNA molecule is an exciting example of a natural complex system with intriguing
properties. Many of these properties remain unexplained and need new approaches for
further analysis. One of the fundamental questions is how information is transferred along
a sequence of nucleotides. For example, if a mutation occurs in the sequence, it is usually
healed. This means that some of the physical parameters of the DNA molecule are sufficiently
sensitive to detect this mutation. The length of a mutation is relatively short (~10 base
pairs) compared with the length of a gene (~$10^3$ to $10^6$ base pairs). Because of the small
statistical weight of a mutation, the mechanical and thermodynamic characteristics are not
sensitive enough for its robust detection. In contrast, the electrical resistance of a DNA
molecule fluctuates strongly even if a single nucleotide in a long sequence is replaced (or
removed) [1, 2]. This property is a signature of coherent transport that gives rise to universal
fluctuations of conductance in mesoscopic samples.

In a DNA molecule the charge carriers move along a double helix formed by two
complementary sequences of four basic nucleotides: A, T, G, and C. A conduction band would
form if the DNA texts exhibited some periodicity [4]. However, many studies of the
DNA texts have revealed rich statistical properties but no periodicity. One of the
suggestions is that a DNA molecule is a stochastic sequence of nucleotides, the main feature of
which is long-range correlations [5]. A popular method of detecting these correlations
is to map a DNA sequence onto a random walk. Long-range correlations are then manifested
in an anomalous scaling of the generated classical diffusion.
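
As an illustration of the random-walk mapping mentioned above, the following minimal sketch builds the walk for a short nucleotide string. The purine/pyrimidine rule used here (step +1 for A or G, step -1 for C or T) and the function name `dna_walk` are assumptions made for this example; the text does not specify which mapping is used.

```python
from itertools import accumulate

# Assumed mapping rule (not given in the text): purines step up, pyrimidines step down.
STEP = {"A": 1, "G": 1, "C": -1, "T": -1}


def dna_walk(sequence):
    """Return the trajectory of the walk generated by a nucleotide string."""
    return list(accumulate(STEP[base] for base in sequence.upper()))


if __name__ == "__main__":
    print(dna_walk("AGCTTTGA"))
```

For an uncorrelated sequence the mean-square displacement of such a walk grows linearly with its length; long-range correlations would show up as a deviation from this law.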

Quantum transport through a DNA molecule is also strongly affected by the correlations.
An uncorrelated sequence of nucleotides localizes all quantum electron states, as occurs in
any 1D white-noise potential, making charge transfer impossible at distances longer than
the localization length $l(E)$. However, since most of the mutations in DNA are successfully
healed, one may assume the existence of charge transport [7] through delocalized states that
are responsible for the transfer of information at much longer distances. Such delocalized
states are expected to exist within exons, the coding regions where the genetic information
is stored. An important feature of charge transfer in exons carrying mutations was reported
in Ref. [8]. It was shown that cancerous mutations usually produce much less variation in
the resistance than noncancerous ones. This apparent distinction sheds light on the problem
of survival of cancerous mutations. The healing of a mutation occurs only if it is detected
by base excision repair enzymes. Since the detection of the mutation is most likely due to
DNA-mediated charge transport [9], it is clear that cancerous mutations, being "electrically
masked," are very unlikely to be detected and then repaired.

On the other hand, the introns, the long segments of DNA that apparently do not
carry genetic code, may not contain delocalized states in the energy spectrum, thus
remaining insulators. In this Letter we give evidence for the validity of this hypothesis using
a theoretical approach based on the results of electron localization in correlated disordered
potentials. Our study of various DNA molecules shows that the energy spectrum of the
exons indeed contains practically delocalized states. In contrast, the electron wave functions
are well localized within the introns.

II. TWO-STRANDED MODEL OF DNA

Let us first consider the widely used model for electron transport in DNA molecules, namely
a discrete lattice with random on-site potential $\varepsilon_n$ and site-independent
nearest-neighbor hopping amplitude $t$,

$$ t(\psi_{n+1} + \psi_{n-1}) = (E - \varepsilon_n)\,\psi_n . \tag{1} $$

The energies $\varepsilon_n$ are the ionization energies of the four nucleotides,
$\varepsilon_A = 8.24$, $\varepsilon_T = 9.14$, $\varepsilon_C = 8.87$, and $\varepsilon_G = 7.75$ eV,
and the hopping amplitude $t$ may vary from 0.1 to 1 eV [10].
Although the on-site energies in a sequence of coupled nucleotides do not coincide exactly
with their ionization potentials, one may neglect this difference as it plays a minor role in
our consideration. The regular periodic potential $\varepsilon_n = V_0$ in Eq. (1) gives rise to Bloch
functions $\psi_n \propto \exp(i\mu n)$ with dispersion relation $E - V_0 = 2t\cos\mu$. The allowed
energies of these extended states lie in a single band of width $4t$, $|E - V_0| < 2t$. In the
opposite case of a white-noise potential, where
$\langle\varepsilon_n\varepsilon_m\rangle = \varepsilon_0^2\,\delta_{nm}$ and $\langle\varepsilon_n\rangle = 0$,
all the states are localized. For weak fluctuations, $\varepsilon_0^2 \ll t^2$, the Lyapunov exponent
(inverse localization length) in the Born approximation is given by

$$ \gamma_0(E) = \frac{1}{l_0(E)} = \frac{\varepsilon_0^2}{8t^2\sin^2\mu} . \tag{2} $$

In this approximation, the wave function extends over many sites, i.e., $l_0(E) \gg 1$, and the
dispersion relation remains the same as for the regular potential, $E = 2t\cos\mu$.
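
To give a feeling for the magnitude predicted by Eq. (2), the sketch below evaluates $l_0$ for an uncorrelated sequence built from the four ionization energies quoted above. It assumes, purely for illustration, that the four nucleotides occur with equal probability, so that $\varepsilon_0^2$ is the equal-weight variance of the four energies; the function names are hypothetical, and the energy is measured from the band center.

```python
import math

# Ionization energies (eV) quoted in the text.
ENERGIES = {"A": 8.24, "T": 9.14, "C": 8.87, "G": 7.75}


def equal_weight_variance(energies=ENERGIES):
    """Variance of the site energies, ASSUMING all four nucleotides are equally likely."""
    values = list(energies.values())
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def born_localization_length(energy, t, variance):
    """Eq. (2): l_0 = 8 t^2 sin^2(mu) / eps_0^2, with E = 2 t cos(mu).

    `energy` is measured from the band center, in the same units as t.
    """
    cos_mu = energy / (2.0 * t)
    if abs(cos_mu) >= 1.0:
        raise ValueError("energy lies outside the band |E| < 2t")
    sin2_mu = 1.0 - cos_mu ** 2
    return 8.0 * t ** 2 * sin2_mu / variance


if __name__ == "__main__":
    var = equal_weight_variance()
    l0 = born_localization_length(0.0, 1.0, var)
    print(f"variance = {var:.4f} eV^2, l_0(band center, t = 1 eV) = {l0:.1f} sites")
```

With $t = 1$ eV the ratio $\varepsilon_0^2/t^2 \approx 0.29$ is not very small, so the resulting length of a few tens of sites should be read as an order-of-magnitude estimate only.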

Most of the existing random potentials are neither ideally periodic nor ideally disordered
(white-noise potential). They form a wide class of so-called correlated disordered potentials.
A generalization of Eq. (2) for this class of potentials was obtained in Ref. [12]:

$$ \gamma(E) = \gamma_0(E)\,\varphi(2\mu), \qquad \varphi(\mu) = 1 + 2\sum_{k=1}^{\infty}\xi(k)\cos(\mu k) . \tag{3} $$

Here $\xi(k) = \langle\varepsilon_n\varepsilon_{n+k}\rangle/\varepsilon_0^2$ is the normalized binary
correlator of the potential. Because of the correlations the energy spectrum may contain
localized as well as extended states. In a first approximation over $\varepsilon_0^2$ the extended
states occupy the intervals where the function $\varphi(\mu)$ in Eq. (3) vanishes. The regions of
localized and extended states are separated by a "mobility edge." For example, a sharp vertical
mobility edge at $\mu = \pi/3$ ($E = t$) appears if the correlation function decays slowly and
oscillates: $\xi(k) = (3/2\pi k)\sin(2\pi k/3)$. In Refs. [3] this type of sharp mobility edge was
observed in the transmission and reflection spectra of a microwave waveguide with specially
designed correlated scatterers. Power-law decay and oscillations of $\xi(k)$ are the necessary
(although not sufficient) attributes of a sharp mobility edge in the energy spectrum. We studied
the correlation function of many different DNA sequences and all of them exhibit slow decay and
oscillations. A typical correlation function is shown in Fig. 1. A different approach to the
Anderson transition in 1D potentials with long-range correlations has been developed by Moura
and Lyra [15]. It is based on a method of generating a random correlated sequence that is
adopted from the theory of fractional Brownian motion. The presence or absence of a mobility
edge is determined by the scaling properties of the power spectrum of the binary correlation
function [15], but not by the binary correlator itself.

FIG. 1: Binary correlation function of the sequence of nucleotides for the Human BRCA gene. The
correlation function drops from 1 at $k = 0$ to about 0.1 at $k > 1$ (left inset). Correlations
extend to distances of a few thousand base pairs, decaying very slowly. An important feature of
this correlation function is its close-to-regular oscillations about zero (right inset).
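
The sharp mobility edge quoted above for $\xi(k) = (3/2\pi k)\sin(2\pi k/3)$ can be checked numerically. The sketch below sums the series of Eq. (3) up to a finite cutoff (the cutoff `n_terms` and the function names are choices made for this example). Under these assumptions the truncated sum for $\varphi(2\mu)$ should approach 3/2 for $\mu < \pi/3$ and 0 for $\pi/3 < \mu < 2\pi/3$.

```python
import math


def xi(k):
    """Correlator xi(k) = 3 / (2 pi k) * sin(2 pi k / 3) quoted in the text, for k >= 1."""
    return 3.0 / (2.0 * math.pi * k) * math.sin(2.0 * math.pi * k / 3.0)


def phi(mu, n_terms=20000):
    """Truncated series phi(mu) = 1 + 2 * sum_{k=1}^{n_terms} xi(k) cos(mu k)."""
    return 1.0 + 2.0 * sum(xi(k) * math.cos(mu * k) for k in range(1, n_terms + 1))


if __name__ == "__main__":
    for mu in (math.pi / 6, math.pi / 4, 0.4 * math.pi, math.pi / 2):
        print(f"mu = {mu:.3f}: phi(2 mu) = {phi(2 * mu):+.4f}")
```

Because the series converges slowly, a small residual ripple remains near $\mu = \pi/3$ for any finite cutoff.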

Although Eq. (3) correctly accounts for the correlations in a single-channel random
potential, it is not appropriate for the analysis of electron localization in real DNA. A DNA
molecule is a two-stranded sequence of nucleotides, i.e., there are two conducting channels.
It is known that the localization length strongly depends on the number of channels in
disordered chains [17]. Our case is even more specific, since the two strands, being random in
the longitudinal direction, exhibit regular A-T and C-G matching in the transverse direction.
This key-to-lock matching between the strands strongly affects electron transport in DNA.

To date, there have been a number of studies of the localization length in DNA molecules.
In Ref. [18] an attempt was made to obtain numerically a localization-delocalization
transition in a single-stranded binary artificial DNA sequence with a special kind of slowly
decaying correlations. However, since in the thermodynamic limit the proposed sequence turned
out to be a regular one, the problem of extended states remains open. In the numerical study
[19] an unexpected tendency to delocalization with an increase of non-perturbative disorder
in an uncorrelated single-stranded DNA was observed. Recently it was claimed that the
transverse key-to-lock base pairing by itself gives rise to a band of extended states even if
the longitudinal correlations are ignored [20]. This numerical result has since been criticized
using analytical argumentation [21]. Thus, it is now clear that for a correct evaluation of
the localization length in DNA one has to (i) use the two-stranded model; (ii) avoid
simplification of the 4-letter DNA alphabet to a binary sequence; and (iii) account for the
longitudinal correlations in both strands as well as the transverse base pairing.

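In our study we use a two-channel model where the Schrödinger equation reads [16]

$$ \begin{aligned}
t(\psi_{1,n+1} + \psi_{1,n-1}) + h\,\psi_{2,n} &= (E - \varepsilon_{1,n})\,\psi_{1,n}, \\
t(\psi_{2,n+1} + \psi_{2,n-1}) + h\,\psi_{1,n} &= (E - \varepsilon_{2,n})\,\psi_{2,n}.
\end{aligned} \tag{4} $$

Here $\psi_{1,n}$ and $\psi_{2,n}$ are the on-site wave functions in the first and second chain,
respectively, and $\varepsilon_{1,n}$ and $\varepsilon_{2,n}$ are the on-site potentials. The hopping
parameters $t$ and $h$ determine the intra- and inter-strand coupling, respectively.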

In the case of a periodic potential there are two momenta $\mu_1$ and $\mu_2$ for each energy $E$.
They are given by two dispersion relations, $E = 2t\cos\mu_{1,2} \pm h$. Here we consider the case
of a band structure in which the two propagating channels overlap. This happens if $h < 2t$. The
band of allowed energies spreads from $-2t - h$ to $2t + h$. Both channels are propagating (i.e.,
$\mu_1$ and $\mu_2$ are real) for the energies $|E| < |2t - h|$.

III. LOCALIZATION LENGTH



For the calculation of the localization length we use the perturbation theory approach
developed for two- and three-channel waveguides in Ref. [16]. The localization length is defined
by the following formula

$$ \gamma(E) = l^{-1}(E) = -\lim_{N\to\infty}\frac{1}{2N}\left\langle \ln \mathrm{Tr}\,(\hat t\,\hat t^{\dagger}) \right\rangle , \tag{5} $$

where $\langle\ldots\rangle$ denotes averaging over disorder and $\hat t$ is the $2 \times 2$
transmission matrix. The transmission matrix $\hat t$, which enters into the Landauer formula
$g = (2e^2/h)\,\mathrm{Tr}\,(\hat t\,\hat t^{\dagger})$, is calculated as a product of $N$ on-site
transfer matrices in the linear (Born) approximation over weak disorder,
$\langle\varepsilon_1^2\rangle, \langle\varepsilon_2^2\rangle \ll t^2$. The results reported here
are based on the following formula for the Lyapunov exponent

[Eq. (6): the explicit expression is not legible in the source; only its denominators, which contain $\sin\mu_1$ and $\sin\mu_2$, survive.]    (6)

In the two-channel model electron localization occurs due to backscattering processes in
both channels with intra-channel momentum transfers $2\mu_1$ and $2\mu_2$. There is also
inter-channel scattering with momentum transfer $\mu_1 + \mu_2$. Accordingly, there are terms
$\varphi_{11}(2\mu_1)$, $\varphi_{22}(2\mu_2)$, and $\varphi_{12}(\mu_1 + \mu_2)$ in Eq. (6). The
functions $\varphi_{ij}$ are expressed through three binary correlators similarly to Eq. (3):

$$ \varphi_{ij}(\mu) = 1 + 2\sum_{k=1}^{\infty}\xi_{ij}(k)\cos(\mu k), \qquad i,j = 1,2. \tag{7} $$

The functions $\xi_{11}$, $\xi_{22}$, and $\xi_{12}$ characterize the intra- and inter-channel
correlations, respectively:

$$ \langle\varepsilon_{1,n}\varepsilon_{1,n+k}\rangle = \varepsilon_1^2\,\xi_{11}(k), \quad
\langle\varepsilon_{2,n}\varepsilon_{2,n+k}\rangle = \varepsilon_2^2\,\xi_{22}(k), \quad
\langle\varepsilon_{1,n}\varepsilon_{2,n+k}\rangle = \varepsilon_{12}\,\xi_{12}(k). \tag{8} $$

Here the mean value $\varepsilon_{12} = \langle\varepsilon_{1,n}\varepsilon_{2,n}\rangle$ can be
either positive or negative, unlike the always positive variances $\varepsilon_{1,2}^2$.
Equation (6) is valid if both channels are propagating, i.e., the wave numbers $\mu_1$ and
$\mu_2$ are real. If one of the channels becomes evanescent, Eq. (6) is replaced by Eq. (33)
from Ref. [16]:

$$ \frac{1}{l(E)} = \frac{\varepsilon_1^2\,\varphi_{11}(2\mu_1) + \varepsilon_2^2\,\varphi_{22}(2\mu_1) + 2\varepsilon_{12}\,\varphi_{12}(2\mu_1)}{32\,t^2\sin^2\mu_1} . \tag{9} $$

At the transition points, where $E = E_c = |2t - h|$, one of the denominators in Eq. (6)
vanishes ($\sin\mu_{1,2} = 0$) and the Born approximation fails.
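
The distinction between the regime with two propagating channels and the regime with one evanescent channel follows directly from the dispersion relations $E = 2t\cos\mu_{1,2} \pm h$. A minimal sketch of this classification is given below; the assignment of the upper sign to channel 1, the function names, and the convention that the energy is measured from the band center are assumptions made for this example.

```python
import math


def channel_momenta(energy, t, h):
    """Solve E = 2 t cos(mu_1) + h and E = 2 t cos(mu_2) - h for real momenta.

    Returns (mu_1, mu_2); an entry is None when the corresponding channel is
    evanescent (no real solution). `energy` is measured from the band center.
    """
    momenta = []
    for shift in (+h, -h):
        cos_mu = (energy - shift) / (2.0 * t)
        momenta.append(math.acos(cos_mu) if abs(cos_mu) < 1.0 else None)
    return tuple(momenta)


def regime(energy, t, h):
    """Classify an energy by the number of propagating channels."""
    n_open = sum(mu is not None for mu in channel_momenta(energy, t, h))
    return {2: "both propagating", 1: "one evanescent", 0: "outside the band"}[n_open]


if __name__ == "__main__":
    for energy in (0.0, 1.0, 2.0, 3.0):
        print(energy, regime(energy, t=1.0, h=0.5))
```

For $h < 2t$ the boundary between the first two regimes lies at $|E| = 2t - h$, which is the transition energy $E_c$ at which the Born approximation breaks down.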

We apply Eq. (5) to a two-stranded DNA molecule. Among the huge number of chemical
and physical characteristics of a DNA molecule, we need here only the ionization potentials
of the nucleotides, $\varepsilon_A$, $\varepsilon_T$, $\varepsilon_C$, and $\varepsilon_G$, and two
hopping amplitudes, $t$ and $h$. Unlike previous studies, we develop here an analytical
two-channel approach, which accounts for intra- and inter-channel correlations. Therefore, we do
not need to simplify a two-stranded DNA sequence to a binary sequence by a coarse-graining
procedure. From this point of view, our approach is much closer to reality.
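
The binary correlators of Eq. (8) can be estimated directly from a nucleotide string without reducing it to a binary alphabet. The sketch below is one possible way to do this, not the procedure used for the figures: it assumes that the second strand is obtained by A-T and C-G complementarity site by site, that each strand's energies are measured from that strand's mean, and that averages are taken along the chain. All function names are hypothetical.

```python
# Ionization energies (eV) quoted in the text and the base-pairing rule.
ENERGIES = {"A": 8.24, "T": 9.14, "C": 8.87, "G": 7.75}
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def centered_energies(bases):
    """Site energies of one strand with the strand mean subtracted (an assumption)."""
    values = [ENERGIES[b] for b in bases]
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def correlator(a, b, k_max):
    """Chain average <a_n b_{n+k}> for k = 0..k_max (requires k_max < len(a))."""
    n = len(a)
    return [sum(a[i] * b[i + k] for i in range(n - k)) / (n - k)
            for k in range(k_max + 1)]


def two_strand_correlators(sequence, k_max):
    """Return the amplitudes (eps_1^2, eps_2^2, eps_12) and the normalized xi_ij(k)."""
    e1 = centered_energies(sequence)
    e2 = centered_energies([COMPLEMENT[b] for b in sequence])
    raw = {"11": correlator(e1, e1, k_max),
           "22": correlator(e2, e2, k_max),
           "12": correlator(e1, e2, k_max)}
    amplitudes = {key: values[0] for key, values in raw.items()}
    # Dividing by the k = 0 value gives xi_ij(0) = 1, as implied by Eq. (8).
    xi = {key: [v / amplitudes[key] for v in values] for key, values in raw.items()}
    return amplitudes, xi


if __name__ == "__main__":
    amplitudes, xi = two_strand_correlators("ATGCATGCGGTACCATTAGC", k_max=5)
    print(amplitudes)
    print(xi["12"])
```

With these energies the inter-strand amplitude $\varepsilon_{12}$ comes out negative for a balanced sequence, because a nucleotide with energy below the mean is always paired with one above the mean.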

The length of a DNA sequence may reach $10^6$ to $10^9$ base pairs. In such a long disordered
chain all the states are localized and a DNA molecule does not conduct. Much shorter
segments may, however, exhibit very different behavior [22]. This means that the conducting
properties of a DNA molecule vary along the sequence of nucleotides, which explains the
wide spectrum of conducting properties obtained in experiments; see the references in
[10]. The physical characteristics, such as the ionization potentials and hopping amplitudes,
are independent of the position of a given nucleotide in the sequence. The only characteristics
that may change along the sequence are the correlation functions. The exons and introns
store different kinds of information, and this affects the correlators. Thus, the localization
length and the conductance of a given segment of a DNA molecule are directly related to the
genetic information stored in this segment. Equation (6) is a mathematical manifestation
of this fact.



IV. NUMERICAL RESULTS FOR LOCALIZATION LENGTH 

In order to demonstrate the inhomogeneities in the conductivity of DNA molecules,
we studied the localization length along the exons and introns. Exons are the parts where
the genetic information is written, and introns are the parts without apparent information
for protein synthesis. The introns occupy a larger part of the DNA sequence of higher
eukaryotes than do the exons. For prokaryotes the situation is the opposite. From the point
of view of the "quality" of the carried information, the introns and exons are the most different
segments, and one may expect very distinct localization properties to exist in these segments.
It was recently shown that the melting of exons and introns also occurs in a different way
[23].

FIG. 2: Color online. Localization length (measured in the number of base pairs) vs energy for
the Human BRCA gene. The length of the exon (intron) is 2120 (10421) base pairs (bps). The results
for the exon and intron are shown by black and grey (red) lines, respectively. The values of the
hopping parameters are $h = 0.5$ eV and $t = 1$ eV. The two channels are propagating if
$6.6 < E < 10.4$ eV. One of the channels becomes evanescent in two symmetric regions,
$10.4 < E < 11.4$ eV and $5.6 < E < 6.6$ eV, each of width $2h = 1$ eV.

We use Eqs. (6) and (9) for the numerical calculation of the localization length. Here
we give the results for the following human DNA molecules: BRCA, ADAM10, SNAP29,
and SUHW1. The results are shown in Figs. 2-5, where we plot the localization length
vs. electron energy for the exon and intron segments. The nucleotide site energies and the
hopping amplitudes are the same for all these figures. Thus, the very different patterns
shown in the figures represent different information codes in different DNA molecules.

For most of the energies the localization length inside the exon region exceeds by an order
of magnitude the localization length inside the intron region. This confirms, by implication,
the fact that very different kinds of information are coded in these regions. The vertical
axis of each figure is cut off approximately at the length of the corresponding exon region.
There are many peaks in the exon regions whose height greatly exceeds the vertical
scale, i.e., the states within these peaks are extended. In contrast, in the intron regions all
the states are well localized. The density of the peaks in Figs. 2 and 3 is much higher than
that in Figs. 4 and 5. Most of the peaks are situated in the region of energy where one of
the channels is evanescent. Similar sharp peaks in the transmission of the exon regions of
Y3 DNA have been numerically obtained in Ref. [24] for a single-stranded DNA. It turns
out that this feature is very robust, since in that study a single-stranded model of DNA was
used.

FIG. 3: Color online. Localization length for the Human ADAM10 gene. The length of the exon
(intron) is 1030 (31752) base pairs (bps). Inset shows the fine structure of one of the peaks.

FIG. 4: Color online. Localization length vs energy for the Human SNAP29 gene. The length of
the exon (intron) is 2141 (21701) base pairs (bps).

FIG. 5: Color online. Localization length vs energy for the Human SUHW1 gene. The length of
the exon (intron) is 1963 (4405) base pairs (bps).

The fine structure of one of the peaks is shown in the inset of Fig. 3. Since the peaks are
of a finite width (~20 meV), they are narrow bands of extended states, and not the discrete
resonant states predicted and observed in random dimers [25]. Resonant tunnelling in random
dimers is due to short-range correlations, in contrast with the specific long-range
correlations that are necessary for the existence of a continuous band of extended
states. In the case of a single channel the width of the band of extended states can be
controlled by the parameters of the binary correlator $\xi(k)$ in Eq. (3). In particular, wide
and narrow bands of extended states have been observed in experiments with single-mode
microwave waveguides [13]. For the two-channel system the relation between the
positions of the mobility edges and the explicit form of the binary correlators is not known.
One may expect that such a relation is determined by the relative phase shifts between the
Fourier components of the oscillatory correlators $\xi_{ij}(k)$. It is worth mentioning that
short- and long-range correlations lead not only to different localization properties but also
to very different classical as well as quantum diffusion in DNA [26].

A pattern $l(E)$ is a particular fingerprint of a given DNA sequence and it can be used,
in principle, for the classification of DNA molecules. In previous studies (see, e.g., [6]),
the DNA sequences have been characterized by the scaling exponent of the corresponding
random walk. We consider the inverse localization length (6) to be more convenient, since
it characterizes a well-defined physical property, the electrical resistivity. Moreover, Eq. (6)
establishes a qualitative relation between the localization length and the informational
characteristic (binary correlators) of the DNA sequence. At the same time the binary correlators
by themselves are not very illustrative. In particular, the plots of the correlators for exon
and intron regions look very similar (see Fig. 6), although these plots, of course, contain the
same information about the DNA sequence as the plots of the Lyapunov exponents do. It
is clear that the presence or absence of the bands of extended states is determined by
subtle interference among the Fourier harmonics of the functions $\varphi_{ij}(\mu)$ given by Eq. (7).

The Lyapunov exponent (5) depends on $\varepsilon_A$, $\varepsilon_T$, $\varepsilon_G$, and
$\varepsilon_C$ as well as on the hopping amplitudes $t$ and $h$. Since the values of $t$ and $h$
are not well established experimentally, we repeated the calculations for different values of
the hopping amplitudes, $0.1 < h < 0.5$ eV and $0.7 < t < 1$ eV. Since our analytical approach
is valid only in the region where the perturbation parameters $\varepsilon_1/t$ and
$\varepsilon_2/t$ are small, we cannot extrapolate our results to the region where $t < 0.5$ eV.
The patterns of the Lyapunov exponents do not change essentially with variation
of the parameters $t$ and $h$. The delocalized states do not disappear, but the positions of the
mobility edges are slightly displaced.

FIG. 6: Color online. Binary correlator for the exon (left panel) and intron (right panel) regions
of the Human SNAP29 gene. Insets show the local behavior of the correlators within small intervals
of $k$.

V. CONCLUSIONS 



In our study of the double-stranded model of DNA we observed a much longer localization
length in exon than in intron regions for practically all the allowed energies and for all
randomly selected DNA sequences. Through the statistical correlations of the nucleotide sequence
making up a DNA molecule, we relate this persistent difference to the qualitatively different
information stored by exons and introns.

For each DNA the pattern $l(E)$ is a unique fingerprint and can be used for the identification
of DNA molecules. All the presented results confirm the suggestion that the localization length
in DNA is determined by specific long-range correlations between the nucleotides and not by
a particular choice of control parameters of the model.

The conducting properties of DNA have attracted much attention since DNA may be used
in electronic devices [29]. We hope that our approach and results will be useful for
further theoretical and experimental studies of the electrical and optical properties of DNA.